
CLUSTERING NEWS ARTICLES USING K-MEANS AND N-GRAMS

  • Project Research
  • 1-5 Chapters
  • Abstract: Available
  • Table of Contents: Available
  • Reference Style: APA
  • Recommended for: Student Researchers
  • NGN 3000

ABSTRACT

Document clustering is an automatic, unsupervised machine learning technique that aims to group related items into clusters or subsets. The goal is to create clusters that are internally coherent yet substantially different from one another: items within the same cluster should be highly similar, while remaining highly dissimilar to items in other clusters. Automatic document clustering plays a significant role in many fields, including data mining and information retrieval. This thesis aimed to improve the overall efficiency of a document clustering technique by using N-grams and an efficient similarity measure, and it improves the purity and accuracy of the resulting clusters. The preprocessing method is based on N-grams (sequences of N consecutive characters), which require no special handling of stop-words or punctuation. Instead, consecutive N-grams overlap across the content of a document, which makes the representation tolerant of errors and greatly increases the quality of the clusters. The approach clusters news articles based on their N-gram representation, thereby reducing noise and increasing the probability that a given sequence occurs within an article. The proposed clustering technique has parameters that can be adjusted at the document representation level to improve the efficiency and quality of the generated clusters. Experiments carried out in the R programming environment on the real-world Reuters-21578 and 20 Newsgroups datasets demonstrated the effectiveness of the proposed clustering technique at different levels of N-grams, in terms of both the accuracy and the purity of the generated clusters. The results also showed that the proposed technique performs better on average than the baseline technique in both accuracy and purity, with the best results obtained when the N-gram window is N = 3.
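
As a rough illustration of the pipeline the abstract describes, the sketch below represents each article as counts of overlapping character N-grams, groups the articles with K-means, and scores the grouping by purity. It is written in Python using only the standard library, not the R environment used in the thesis. The function names (`char_ngrams`, `kmeans`, `purity`), the choice of cosine similarity, the random initialisation, and the toy documents are all assumptions made for illustration; the thesis's actual similarity measure and implementation are not given here.

```python
import math
import random
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams; no stop-word or punctuation handling."""
    text = " ".join(text.lower().split())  # lowercase and collapse whitespace
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def unit(vec):
    """Scale a sparse vector (dict) to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {g: v / norm for g, v in vec.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(v * b.get(g, 0.0) for g, v in a.items())


def mean_vector(vectors):
    """Unit-length centroid of a non-empty list of sparse vectors."""
    total = {}
    for vec in vectors:
        for g, v in vec.items():
            total[g] = total.get(g, 0.0) + v
    return unit(total)


def kmeans(vectors, k, iterations=20, seed=0):
    """Assumed variant: K-means with cosine similarity and random initial centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)
    labels = None
    for _ in range(iterations):
        # Assign every document to its most similar centroid.
        new_labels = [
            max(range(k), key=lambda c: cosine(vec, centroids[c])) for vec in vectors
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [vec for vec, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean_vector(members)
    return labels


def purity(labels, truth):
    """Fraction of documents that belong to the majority class of their cluster."""
    clusters = {}
    for lab, cls in zip(labels, truth):
        clusters.setdefault(lab, Counter())[cls] += 1
    return sum(max(c.values()) for c in clusters.values()) / len(labels)


if __name__ == "__main__":
    # Toy documents and class labels, invented for illustration only.
    docs = [
        "Oil prices rise as crude supply falls",
        "Crude oil supply cut lifts prices",
        "The team won the football match",
        "Football team loses the final match",
    ]
    truth = ["oil", "oil", "sport", "sport"]
    vectors = [unit(char_ngrams(d, n=3)) for d in docs]
    labels = kmeans(vectors, k=2)
    print(labels, purity(labels, truth))
```

Changing the `n` argument of `char_ngrams` is one way to vary the N-gram window at the document representation level, which is the kind of parameter the abstract refers to.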



